QTM 350 - Data Science Computing

Lecture 03: Command Line Interface

Danilo Freire

Department of Quantitative Theory and Methods
Emory University

09 September, 2024

Recap and lecture overview 📚

Brief recap of last class

Early computing and data representation

  • Computers evolved from people to mechanical calculators to silicon-based machines
  • Modern computers use the Von Neumann architecture, storing both instructions and data in memory
  • Computers represent data using binary (base 2) numbers made up of 0s and 1s
  • A bit is a single binary digit; 8 bits make a byte
  • Hexadecimal (base 16) is a compact way to represent binary, with each hex digit corresponding to 4 bits
  • Abstraction allows representing complex data like images and text using numbers
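
As a quick refresher, the base conversions from last class can be tried directly in the shell. This is a minimal sketch that assumes Bash (the `base#digits` arithmetic form is a Bash feature that zsh also accepts); the variable names are made up for illustration.

```shell
# Convert between binary, hexadecimal and decimal with shell arithmetic.
dec_from_bin=$((2#11111111))      # base#digits: binary 11111111
dec_from_hex=$((16#FF))           # hexadecimal FF
hex_from_dec=$(printf '%X' 255)   # decimal 255 written in hexadecimal

echo "binary 11111111 = $dec_from_bin"
echo "hex FF          = $dec_from_hex"
echo "decimal 255     = 0x$hex_from_dec"
```

All three lines describe the same byte: eight 1-bits, two hex digits, or the decimal number 255.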

Brief recap of last class

Representing images, colours and text

  • Images can be broken down into a grid of coloured pixels
  • Colours are represented using the RGB model, with each colour channel (red, green, blue) ranging from 0-255
  • With 8 bits per channel there are 256 levels per channel, allowing for over 16 million possible colours (256 × 256 × 256)
  • Text is broken into individual characters, with each character mapped to a number using an encoding like ASCII
  • ASCII is a simple lookup table mapping the numbers 0-127 to characters (8-bit extensions of ASCII use 128-255 as well)
  • Unicode extends ASCII to support accented characters and symbols from all languages
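
The numbers behind colours and characters can be checked from the shell too. A small sketch, assuming a POSIX-style `printf`; the variable names are only illustrative.

```shell
# Three colour channels with 256 levels each.
colours=$((256 * 256 * 256))
echo "$colours possible colours"

# A leading quote makes printf print the numeric code of the character.
code_of_A=$(printf '%d' "'A")
echo "A is stored as the number $code_of_A"

# And back again: octal 101 is decimal 65.
char_65=$(printf '\101')
echo "65 maps to the character $char_65"
```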

Brief recap of last class

Programming languages

  • Konrad Zuse created the first programmable computers and high-level programming language in the 1940s
  • Assembly allows writing human-readable instructions that map closely to machine code
  • High-level languages like Python abstract away hardware details and are more portable across systems
  • Low-level languages are harder to read and write but very fast and efficient
  • Compiled languages are converted to machine code before execution; interpreted languages are executed on the fly

Today’s lecture

Command line: the old school way of interacting with computers

  • Today, we will learn about the command line, a text-based interface to interact with computers
  • We will learn some basic commands to navigate the file system, create and delete files, and run programs
  • We will also learn about shell scripting, a way to automate tasks using the command line
  • The command line is still widely used in data science and programming, especially for remote servers, cloud computing, and automation

Questions? 🤓

What is the command line? 💻

A computer in a nutshell

Operating system

Credit Dave Kerr

  • The operating system (OS) is system software that interfaces with (and manages access to) a computer’s hardware. It also provides software resources
  • The OS is divided into the kernel and user space
  • The kernel is the core of the OS. It’s responsible for interfacing with hardware (drivers), managing resources etc. Running software in the kernel is extremely sensitive! That’s why users are kept away from it!
  • The user space provides an interface for users, who can run programs/applications on the machine. Access to hardware (e.g., memory usage) is managed by the kernel. Programmes in user space are essentially sandboxed, which limits how much damage they can do.
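
To make the split a little more concrete: `uname` is an ordinary user-space program that asks the kernel to describe itself. A minimal sketch (the variable names are made up):

```shell
# uname runs in user space and reports what the kernel says about itself.
kernel_name=$(uname -s)      # for example Linux or Darwin
kernel_release=$(uname -r)   # version string of the running kernel
echo "Kernel: $kernel_name, release: $kernel_release"
```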

A computer in a nutshell

Kernels and shells

  • The shell is just a general name for any user space program that allows access to resources in the system, via some kind of interface
  • Shells come in many different flavours but are generally provided to aid a human operator in accessing the system. This could be interactively, by typing at a terminal, or via scripts, which are files that contain a sequence of commands
  • Modern computers use graphical user interfaces (GUIs) as the standard tool for human-computer interaction
  • Why “kernel” and “shell”? The kernel is the soft, edible part of a nut or seed, which is surrounded by a shell to protect it. Useful metaphor, innit?

Interacting with the shell

Terminals

Credit Dave Kerr

  • Things are still a bit more complicated
  • We’re not directly interacting with the “shell” but using a terminal
  • A terminal is just a program that reads input from the keyboard, passes that input to another programme, and displays the results on the screen
  • A shell program on its own does not do this - it requires a terminal as an interface
  • Why “terminal”? Back in the old days (before computer screens existed), terminal machines (hardware!) were used to let humans interface with large machines (“mainframes”). Often many terminals were connected to a single machine
  • When you want to work with a computer in a data centre (or remotely in cloud computing), you’ll still do pretty much the same: a terminal on your machine talks to a shell running on the remote one
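
A shell can check whether it is actually talking to a terminal. The sketch below uses the standard test `[ -t 0 ]`; the variable name `input_source` is just for illustration.

```shell
# [ -t 0 ] is true when standard input (file descriptor 0) is attached to a terminal.
if [ -t 0 ]; then
    input_source="terminal"
    echo "Input comes from the terminal device $(tty)"
else
    input_source="not a terminal"
    echo "Input is not a terminal (for example a pipe, a file, or a scheduled job)"
fi
```

Typed into an interactive terminal this should take the first branch; run from a script that is fed through a pipe it takes the second.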

Interacting with the shell

Command line

Credit Dave Kerr

  • Terminals are really quite simple - they’re just interfaces

  • The first thing that a terminal will do is run a shell - a programme we can use to operate the computer

  • Back to the shell: the shell usually takes input

    • Interactively from the user via the terminal’s command line
    • From scripts, which it executes without a command line
  • In interactive mode the shell then returns output

    • To the terminal where it is printed/shown
    • To files or other locations
  • The command line represents what is shown and entered in the terminal. It can be customised (e.g., with colour highlighting) to make interaction more convenient
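
Here is a rough sketch of those input and output routes. It assumes `bash` is installed; the file names `hello.sh` and `output.txt` are hypothetical, and everything happens in a scratch directory that is deleted at the end.

```shell
workdir=$(mktemp -d)   # scratch directory, removed at the end

# Input from a script: save two commands in a file and let bash read them.
printf 'echo "hello from a script"\necho "second line"\n' > "$workdir/hello.sh"

# Output to the terminal (the default).
bash "$workdir/hello.sh"

# Output to a file: > redirects what would otherwise be printed.
bash "$workdir/hello.sh" > "$workdir/output.txt"
line_count=$(wc -l < "$workdir/output.txt" | tr -d ' ')
echo "output.txt has $line_count lines"

rm -r "$workdir"
```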

Shell variants

Bash, Zsh, and others

  • It is important to note that there are many different shell programmes, and they differ in terms of functionality
  • On most Unix-like systems, the default shell is a program called bash, which stands for “Bourne Again Shell”
  • Other examples are the Z Shell (or zsh; default on macOS), Windows Command Prompt (cmd.exe, the default CLI on MS Windows), Windows PowerShell, C Shell, and many more
  • When a terminal opens, it will immediately start the user’s preferred shell programme. (This can be changed.)
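
The preferred (login) shell and the shell that is running right now are not always the same thing. A small sketch for telling them apart, assuming a Unix-like system; it only recognises Bash and zsh and labels everything else “another shell”.

```shell
# $SHELL holds the login shell configured for your account (it may be unset in minimal environments).
echo "Login shell: ${SHELL:-unknown}"

# Bash and zsh each set their own version variable, which reveals the shell actually running.
if [ -n "${BASH_VERSION:-}" ]; then
    running_shell="bash $BASH_VERSION"
elif [ -n "${ZSH_VERSION:-}" ]; then
    running_shell="zsh $ZSH_VERSION"
else
    running_shell="another shell"
fi
echo "Running shell: $running_shell"

# Login shells registered on this machine, if the system keeps such a list.
if [ -r /etc/shells ]; then
    cat /etc/shells
fi
```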

Why bother with the shell? 🤷

Why bother with the shell?

Why should you use this…

… instead of this?

Why bother with the shell?

The programmer’s best friend

  1. Speed. Typing is fast: A skilled shell user can manipulate a system at dazzling speeds just using a keyboard. Typing commands is generally much faster than exploring through user interfaces with a mouse.

  2. Power. Both for executing commands and for fixing problems. There are some things you just can’t do in an IDE or GUI. It also avoids memory complications associated with certain applications and/or IDEs.

  3. Reproducibility. Scripting is reproducible, while clicking is not.

  4. Portability. A shell can be used to interface to almost any type of computer, from a mainframe to a Raspberry Pi, in a very similar way. The shell is often the only game in town for high performance computing (interacting with servers and supercomputers).

  5. Automation. Shells are programmable: working in the shell allows you to program workflows, that is, to create scripts that automate time-consuming or repetitive processes.

  6. Become a marketable data scientist. Modern programming is often polyglot. The shell provides a common interface for tooling. Modern solutions are often built to run in containers on Linux. In this environment shell knowledge has become very valuable. In short, the shell is having a renaissance in the age of data science.
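
To give a flavour of the automation point, here is a minimal sketch that renames a batch of files in one go. The `survey N.csv` files are hypothetical and are created in a scratch directory so nothing on your machine is touched.

```shell
workdir=$(mktemp -d)
touch "$workdir/survey 1.csv" "$workdir/survey 2.csv" "$workdir/survey 3.csv"

# Rename every CSV file: replace spaces in the name with underscores.
for path in "$workdir"/*.csv; do
    name=$(basename "$path")
    new_name=$(printf '%s' "$name" | tr ' ' '_')
    mv "$path" "$workdir/$new_name"
done

ls "$workdir"
renamed_count=$(ls "$workdir" | grep -c '_')
rm -r "$workdir"
```

The same loop works for three files or three thousand, and rerunning the script repeats exactly the same steps, which is the reproducibility point as well.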

The Unix philosophy

The Unix philosophy

The shell tools that we’re going to be using have their roots in the Unix family of operating systems originally developed at Bell Labs in the 1970s.

Besides paying homage, acknowledging the Unix lineage is important because these tools still embody the “Unix philosophy”:

Do One Thing And Do It Well

By pairing and chaining well-designed individual components, we can build powerful and much more complex larger systems.

You can see why the Unix philosophy is also referred to as “minimalist and modular”.

Again, this philosophy is very clearly expressed in the design and functionality of the Unix shell.
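
A sketch of what “pairing and chaining” looks like in practice: four small tools, each doing one job, connected with pipes (`|`). The input lines and the function name `most_common` are made up for illustration.

```shell
# Read lines on standard input and report the most frequent one with its count.
most_common() {
    sort |          # put identical lines next to each other
        uniq -c |   # collapse each group and prefix it with its count
        sort -rn |  # order by count, largest first
        head -n 1   # keep only the top line
}

top=$(printf 'bash\nzsh\nbash\nfish\nbash\nzsh\n' | most_common)
echo "$top"
```

None of the four tools knows anything about the others; the pipe is the only glue.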

Things to use the shell for

  • Navigating the file system
  • Version control with Git
  • Renaming and moving files
  • Finding things on your computer
  • Writing and running code
  • Installing and updating software
  • Monitoring system resources
  • Connecting to cloud environments
  • Running analyses (“jobs”) on supercomputers
  • … and much more!

Shell basics 🐚 🤓

Shell: First look

Let’s open up our shell!

A convenient way to do this is through VS Code’s built-in Terminal.

Click on the View menu, then Terminal. You can also use the shortcut Ctrl + ` (backtick).

Your system default shell is loaded. To find out what that is, type echo $SHELL in the terminal.

```{bash, echo=TRUE}
echo $SHELL
```

It is Z shell in my case

… what about you? It is your turn to find out!

Your turn!

Of course, it’s always possible to open up the shell directly if you prefer. It’s your turn!

Feel free to check our class tutorial on how to set up your shell in VS Code.

Open your terminal and type the following commands, one per line (if an example starts with a $ prompt, do not type that first $):

```{bash, eval=FALSE, echo=TRUE}
echo $SHELL
whoami
pwd
mkdir new-folder
cd ..
ls
man ls # type 'j' to scroll down, 'k' to scroll up, 'q' to quit
```

Share your results with a colleague (or the class)!

Shell: First look

You should see something like:

```{bash, eval=FALSE, echo=TRUE}
username@hostname:~$
```

This is shell-speak for: “Who am I and where am I?”

  • username denotes a specific user (one of potentially many on this computer).

  • @hostname denotes the name of the computer or server.

  • :~ denotes the directory path (where ~ signifies the user’s home directory).

  • $ (or maybe %) denotes the start of the command prompt.

    • (For a special “superuser” called root, the dollar sign will change to a #).

```{bash, echo=TRUE}
whoami
pwd
```
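
The pieces of the prompt can be rebuilt by hand from ordinary commands. This is only a sketch: real prompts are configured by the shell itself and usually abbreviate the home directory to ~, which this version does not do.

```shell
user=$(id -un 2>/dev/null || echo unknown)   # same answer as whoami
host=$(uname -n)                             # network name of this machine
here=$(pwd)                                  # current working directory

# \$ inside double quotes is a literal dollar sign.
prompt="${user}@${host}:${here}\$ "
echo "$prompt"
```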

Syntax

Syntax

All Bash commands have the same basic syntax:

command option(s) argument(s)

Examples:

```{bash, eval=FALSE, echo=TRUE}
# list files in the Documents directory
# with human-readable sizes
ls -lh ~/Documents
```

```{bash, eval=FALSE, echo=TRUE}
# sort the file and remove duplicates
sort -u file.txt
```

Commands

  • You don’t always need options or arguments

  • For example:

    • ls ~/Documents/ and ls -lh ~/Documents are both valid commands that will yield (different) output
  • However, you always need a command.
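
A small sketch of the three forms side by side. The files `notes.txt` and `.hidden` are hypothetical and live in a scratch directory that is deleted at the end.

```shell
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/.hidden"

(cd "$workdir" && ls)    # command only: lists the current directory
ls "$workdir"            # command + argument
ls -a "$workdir"         # command + option + argument: dot-files are shown too

plain=$(ls "$workdir")
hidden_shown=$(ls -a "$workdir" | grep -c 'hidden')
rm -r "$workdir"
```

The parentheses in the first line run `cd` in a subshell, so your own working directory does not change.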

Options (also called Flags)

  • Start with a dash. Usually one letter.

  • Multiple options can be chained under a single dash.

    ```{bash, eval=FALSE, echo=TRUE}
    ls -l -a -h /var/log # This works
    ls -lah /var/log # So does this
    ```

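  • The exception is the (rarer) long options, which start with two dashes, spell out whole words, and cannot be chained. (The example here needs GNU ls, as found on Linux.)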

    ```{bash, eval=FALSE, echo=TRUE}
    ls --group-directories-first --human-readable /var/log
    ```

  • -l (for ls): Use a long listing format. This option shows detailed information about the files and directories

  • -h (for ls): With -l, print sizes in human-readable format (e.g., KB, MB)

  • -u (for sort): Unique; filters duplicate lines out of the output

  • Think it’s difficult to memorize what the individual letters stand for? You’re totally right!
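
If in doubt, you can check for yourself that chained and separate options behave the same. A sketch using a scratch directory and a hypothetical `fruit.txt`:

```shell
workdir=$(mktemp -d)
printf 'pear\napple\npear\nfig\napple\n' > "$workdir/fruit.txt"

# Short options given separately and chained under one dash.
separate=$(ls -l -a "$workdir")
chained=$(ls -la "$workdir")
if [ "$separate" = "$chained" ]; then
    echo "ls -l -a and ls -la print the same listing"
fi

# sort -u: sort the lines and keep one copy of each.
unique_sorted=$(sort -u "$workdir/fruit.txt")
echo "$unique_sorted"

rm -r "$workdir"
```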

Arguments

  • Tell the command what to operate on.

  • Which inputs are valid depends entirely on the command.

  • Can be a file, path, a set of files and folders, a string, and more

  • Sometimes more than just one argument is needed:

    ```{bash, eval=FALSE, echo=TRUE}
    mv figs/cat.png best-figs/cat02.png
    ```
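
The order of arguments matters. A sketch in a scratch directory, reusing the `figs` and `best-figs` names from the example; `dog.png` and `owl.png` are made up.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/figs" "$workdir/best-figs"
touch "$workdir/figs/cat.png" "$workdir/figs/dog.png" "$workdir/figs/owl.png"

# Two arguments: source first, destination second (this moves and renames in one step).
mv "$workdir/figs/cat.png" "$workdir/best-figs/cat02.png"

# Three arguments: when the last one is a directory, everything before it is copied into it.
cp "$workdir/figs/dog.png" "$workdir/figs/owl.png" "$workdir/best-figs/"

in_best_figs=$(ls "$workdir/best-figs")
echo "$in_best_figs"
rm -r "$workdir"
```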

Help! 🆘 😟

Multiple ways to get help

  • The man tool can be used to look at the manual page for a topic.

  • The man pages are grouped into sections; we can see them with man man.

  • The cht.sh website can be used directly from the shell to get help on tools. Run it like this: curl cht.sh/command

Multiple ways to get help

  • You can also install the tldr tool which provides simplified help pages for common commands. Run it like this: tldr command

```{bash, echo=TRUE}
tldr ls
```

  • For more info on how to get help, see here.
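
One more trick, as a sketch: some commands (such as cd) are built into the shell rather than being separate programs, so it helps to ask what a name refers to before looking for help. The `help` line assumes Bash and is skipped in other shells.

```shell
# type reports what kind of thing a command name refers to.
type cd   # a shell builtin
type ls   # usually a file on disk (or an alias in interactive shells)

# command -v prints the bare name for builtins and the full path for programs.
cd_kind=$(command -v cd)
ls_path=$(command -v ls)
echo "cd resolves to: $cd_kind"
echo "ls resolves to: $ls_path"

# In Bash, builtins are documented by the help builtin.
if [ -n "${BASH_VERSION:-}" ]; then
    help cd | head -n 3
fi
```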

Getting help with man

The man command (“manual pages”) is your friend if you need help.

```{bash, echo=TRUE}
man ls
```

Getting help with man

Manual pages are shown in the shell. Here are the essentials to navigate through contents presented in the pager:

  • d - Scroll down half a page
  • u - Scroll up half a page
  • j / k - Scroll down or up a line. You can also use the arrow keys for this
  • q - Quit
  • /pattern - Search for text provided as “pattern”
  • n - When searching, find the next occurrence
  • N - When searching, find the previous occurrence
  • These and other man tricks are detailed in the help pages (hit “h” when you’re in the pager for an overview).

RTFM!

Always check the documentation!

man page explorer challenge

Partner up and choose a command from the list below. Use man to complete these tasks:

```{bash, eval=FALSE, echo=TRUE}
# Choose one: ls, cd, cp, mv, rm, mkdir, rmdir, touch, cat, find
```

  1. Summarise the command’s purpose in one sentence.
  2. Find an interesting option and explain what it does.
  3. Create an example using your command with at least two options.
  4. Bonus: Combine your command with your partner’s in a single line.

You have about 5 minutes. Be ready to share your findings!

Reflection: How was using man compared to online searches? How might you use it in future projects?

More navigation commands: A cheat sheet

  • ls (list): Show files and directories in the current directory
  • ls -l: Long listing format with detailed information
  • ls -a: Show hidden files (those starting with a dot)
  • ls -lh: Long listing format with human-readable sizes
  • ls -R: List subdirectories recursively
  • pwd (print working directory): Show the current directory path
  • cd (change directory): Change the current working directory
  • cd -: Go back to the previous directory
  • .: Refers to the current directory
  • ..: Refers to the parent directory
  • ~: Refers to the home directory
  • mkdir: Create a new directory
  • touch: Create a new empty file or update timestamps
  • cp: Copy files or directories
  • mv: Move or rename files or directories
  • rm: Remove files (use with caution!)
  • rmdir: Remove empty directories
  • cat: Display file contents
  • find: Search for files and directories
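
The shortcuts `.`, `..` and `cd -` are easiest to understand by walking around. A sketch in a scratch directory; the `project/data` folders are hypothetical.

```shell
start=$(pwd)
workdir=$(mktemp -d)
mkdir -p "$workdir/project/data"

cd "$workdir/project/data"
cd ..                        # .. is the parent directory: now in project
parent=$(basename "$(pwd)")
cd ./data                    # . is the current directory: back in data
cd - > /dev/null             # - is the previous directory: project again (cd - also prints the path)
previous=$(basename "$(pwd)")

echo "after 'cd ..': $parent, after 'cd -': $previous"
cd "$start" && rm -r "$workdir"
```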

For a more detailed overview, click here

Shell navigation exercise

Follow these steps to practice using basic shell commands. Type each command and observe the results.

  1. Open your terminal and navigate to your home directory: cd ~
  2. Create a new directory called “practice” and change into it: mkdir practice && cd practice
  3. Create two empty files called “file1.txt” and “file2.txt”: touch file1.txt file2.txt
  4. List the contents of the directory: ls
  5. Move file2.txt to a new name (rename), file3.txt: mv file2.txt file3.txt
  6. List the contents again to verify the change, then return to the home directory: ls && cd ~
  7. Remove the “practice” directory and its contents: rm -r practice
  8. Verify that the directory has been removed: ls
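
Instead of checking the `ls` output by eye, the shell can test for the directory itself. The sketch below replays the steps in a scratch directory (standing in for your home directory) and then uses the standard test `[ -d NAME ]`.

```shell
sandbox=$(mktemp -d)   # stands in for the home directory
cd "$sandbox"

mkdir practice && cd practice
touch file1.txt file2.txt
mv file2.txt file3.txt
cd "$sandbox"
rm -r practice

# [ -d NAME ] succeeds only if NAME exists and is a directory.
if [ -d practice ]; then
    status="still there"
else
    status="removed"
fi
echo "practice is $status"

cd / && rmdir "$sandbox"
```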

Shell navigation exercise to try at home

  1. Open your terminal and navigate to your home directory: cd ~
  2. Create a new directory called “shell-practice”: mkdir shell-practice
  3. Change into the new directory: cd shell-practice
  4. Create three empty files called file1.txt, file2.txt, and file3.txt: touch file1.txt file2.txt file3.txt
  5. List the contents of the directory: ls
  6. Create a subdirectory called “subdir”: mkdir subdir
  7. Move file2.txt into the subdirectory: mv file2.txt subdir/
  8. Copy file1.txt to a new file called file4.txt: cp file1.txt file4.txt
  9. List the contents of the current directory and the subdirectory: ls -R
  10. Change to the parent directory: cd ..
  11. Remove the entire shell-practice directory and its contents: rm -r shell-practice
  12. Verify that the directory has been removed: ls

Bonus: Try using the man command to learn more about any of the commands you’ve used.

  • Were there any commands that surprised you?
  • Which commands did you find most useful?

Summary

Today we…

  • Explored the command line’s role in data science and programming
  • Discussed the Unix philosophy and the significance of the shell
  • Covered basic shell commands like pwd, ls, and cd for file system navigation
  • Introduced special symbols such as ~, ., and .. for directory navigation
  • Practiced executing these commands in the shell environment

Next class

  • We will learn a bit more about the command line, especially about text processing and scripting
  • We will also learn how to use vim or neovim as text editors
    • Vim is a powerful text editor that is highly configurable and can be used for many different programming languages. And it is my editor of choice! 🤓
  • After that, we will introduce Git and GitHub for version control and collaboration

Questions? 😉

Thank you very much and see you next class! 😊 🙏🏼